Green’s Bronze Mystery

The Dataset (Boring and Skippable)

I will be using a dataset I already have some familiarity with. Earlier this year, a #TidyTuesday data visualization event used the 120 Years of Olympic History dataset, which is available on Kaggle and was scraped from sports-reference.com.

This dataset is missing the Tokyo and Paris Games, and I think that's fine. I could have pulled that data from elsewhere at a matching level of detail and merged it into this dataset, but I chose not to: that flavor of work is a lot less fun than what I will be doing in the rest of this report.

I will not look at Olympics from before the Second World War, even though the dataset covers them in wonderful detail. This little story focuses on visual methods, and those do not get strictly better with more data points. The older Games were set in a distinctly different world, and the break the Olympics took during the war offers a wonderful, unambiguous point at which to split the data.
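For anyone who wants to reproduce this cut, here is a minimal Python sketch. It assumes the Kaggle file is a CSV with a `Year` column, read with the standard library's `csv` module; the helper names `read_rows` and `postwar_only` are hypothetical, not part of the dataset or any library. The 1940 and 1944 Games were cancelled, and the first post-war Games (London and St. Moritz) were held in 1948, so keeping `Year >= 1948` matches the split described above.

```python
import csv
from typing import Iterable

# First post-war Games (London Summer, St. Moritz Winter) were held in 1948.
FIRST_POSTWAR_YEAR = 1948


def read_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read CSV lines (assumed athlete_events-style layout) into a list of dicts."""
    return list(csv.DictReader(lines))


def postwar_only(rows: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only rows from Games held in or after FIRST_POSTWAR_YEAR."""
    return [row for row in rows if int(row["Year"]) >= FIRST_POSTWAR_YEAR]
```

To use it on the real file (the filename `athlete_events.csv` is an assumption), open it with `newline=""` and pass the file handle to `read_rows`, then filter the result with `postwar_only`.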

Let's also clear up another small matter here. For the purposes of this report, I define Europe as the current and historic official members of the European Olympic Committees, excluding the Soviet Union and Russia.
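Here is a sketch of how that definition could be applied to the dataset's `NOC` column, again in Python. The set of codes below is a hypothetical and deliberately incomplete sample, not the real European Olympic Committees membership list. What matters is the rule: membership in that list, minus the Soviet Union and Russia (assumed here to appear under the dataset's codes `URS` and `RUS`), which stay excluded even if a complete membership list contains them.

```python
# Illustrative subset only: the real list would hold every current and
# historic member of the European Olympic Committees.
EUROPE_NOCS = frozenset({"GBR", "FRA", "ITA", "SWE", "NOR", "GER", "FRG", "GDR", "TCH", "YUG"})

# The report's explicit exceptions: the Soviet Union and Russia.
EXCLUDED_NOCS = frozenset({"URS", "RUS"})


def is_europe(noc: str, europe_nocs: frozenset[str] = EUROPE_NOCS) -> bool:
    """Return True if a NOC code counts as 'Europe' under this report's definition.

    The exclusion list is checked even when europe_nocs contains those codes.
    """
    return noc in europe_nocs and noc not in EXCLUDED_NOCS
```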

Bonus: Let’s check an alternate explanation

Finally, let's go for something exciting, big, and a bit clumsy. Let's extend the figure above to give a fuller picture of these two Olympics.

(Warning: this part might look bad on some screens and browsers.)

Here we put what we already looked at on the Y-axis and add all the other possible results on the X-axis. Note that we are looking at raw medal counts rather than percentages, so we are back to being unable to compare Bronze counts with the other medals. I will also repeat once more that I had to remove all team events for these last few figures.
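Removing team events requires some rule for spotting them. The Python sketch below is one possible heuristic, not necessarily the one behind these figures. It assumes the dataset's `Games`, `Event`, `NOC` and `Medal` columns, with missing medals stored as the string `NA`, and it flags an event as a team event when a single NOC collects the same medal more than once in it. It then tallies raw results, counting non-medal entries as "No medal", which mirrors looking at raw counts rather than percentages. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

Row = dict[str, str]


def team_event_keys(rows: Iterable[Row]) -> set[tuple[str, str]]:
    """Flag (Games, Event) pairs that look like team events.

    Heuristic (an assumption): if one NOC received the same medal more than
    once in one event at one Games, several athletes shared that medal.
    """
    counts: Counter = Counter()
    for row in rows:
        if row["Medal"] != "NA":
            counts[(row["Games"], row["Event"], row["NOC"], row["Medal"])] += 1
    return {(games, event) for (games, event, _noc, _medal), n in counts.items() if n > 1}


def individual_only(rows: list[Row]) -> list[Row]:
    """Drop every row that belongs to an event flagged by team_event_keys."""
    team = team_event_keys(rows)
    return [row for row in rows if (row["Games"], row["Event"]) not in team]


def raw_result_counts(rows: Iterable[Row]) -> Counter:
    """Tally raw results per category, labelling missing medals as 'No medal'."""
    return Counter("No medal" if row["Medal"] == "NA" else row["Medal"] for row in rows)
```

One known weakness: a tie in an individual event where two athletes from the same NOC share a medal would be misclassified as a team event.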

With this many numbers the figure turns into a bit of a Choose-Your-Own-Adventure, though. It’s hard to pick which numbers you should be comparing.

But I believe it's actually a good thing to end this journey on the raw results rather than the percentages. It's important to remember what we are looking at here: every athlete who made it into this data deserves to be treated as a story as well as a statistic. So here is a big, clumsy interactive figure where you can hover over a point to see the person behind each result.

Go find some names you know and learn some new ones!